Abstract

The effect of bootstrapping was studied by examining whether major profile patterns 
were replicated when sample sizes were reduced. Profile patterns estimated from the original
sample (N = 645) of the WPPSI-III standardization data were considered major profiles. For
bootstrapping, the original sample was reduced to n = 50, n = 25, and n = 20. From each 
reduced sample, profile patterns were extracted and compared to the major profile patterns. 
Then, the bootstrapping technique was applied to the reduced sample and the bootstrap 
correlation matrix was estimated. Using the correlation matrix, profile patterns were estimated 
and they were compared to the major profile patterns. To measure correspondence between the 
major profiles and the estimated profiles, correlation coefficients were computed. The profile 
patterns obtained from reduced samples without bootstrapping were poorly matched with the 
major profiles, whereas the profile patterns from bootstrapped samples were well matched with 
the major profile patterns. The bootstrapping substantially contributed to replicating major 
profiles when sample sizes were severely reduced. 
Replication of Major Profile Patterns in Structural Equation Modeling:

Effect of Bootstrapping in a Small Sample 
Se-Kang Kim 

The Psychological Corporation 
San Antonio, Texas 

This paper was designed to explore how much bootstrapping could help replicate major
profile patterns when sample sizes were reduced from the original sample. Little research has
been done on how effectively bootstrapping replicates the factor structures of a population when
a small sample is analyzed. It is generally known that estimates from a small sample may not
adequately represent the population characteristics of interest because the sample may not
include all aspects of those characteristics. Similarly, factor structures estimated from a small
sample may not represent the true factor structures in a population, and as the sample size gets
smaller, representation of the population gets even worse. However, collecting a larger sample
to fix such a problem is expensive and time consuming. For this reason, the bootstrapping
technique was introduced in the study.

The bootstrapping procedure creates pseudo-replicate datasets by resampling. The
procedure starts by selecting at random one case from a designated sample for bootstrapping; a
user documents the case, returns it to the sample, randomly selects another case, documents its
score, returns it to the sample, and so on. The step is repeated until the size of the first bootstrap
sample reaches the same size as the original sample. Since the bootstrapping procedure is based
on purely random sampling (i.e., replacement is allowed), this random procedure allows all
possible combinations of data structure that one can think of. For example, suppose there are
four observations, A, B, C, and D. If these four observations are bootstrapped, the 1st bootstrap
sample could be D, C, B, A. The first observation of the original sample now becomes the last
observation in the bootstrap sample. The 2nd bootstrap sample could be A, B, A, B. In this
bootstrap sample, the 1st and 2nd observations are the same as those of the original sample, but
the 3rd and 4th are the same as the 1st and 2nd observations in the original sample. Or one can
think of (A, A, A, A), ..., (D, D, D, D), since the chance of being selected is the same (p = .25)
for each observation on every draw. Again, this random procedure provides all possible
combinations of data structure over and above the data structure of the original sample. This is
the key reason why the bootstrap procedure was implemented for the present study.
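
The resampling step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study; the function name `bootstrap_sample` and the fixed seed are assumptions made for the example. The sketch also enumerates every ordered resample of the four observations to show that each of the 4^4 = 256 outcomes, including (A, A, A, A), can occur.

```python
import itertools
import random


def bootstrap_sample(data, rng):
    """Draw one bootstrap sample: len(data) cases, selected with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


observations = ["A", "B", "C", "D"]
rng = random.Random(0)  # fixed seed only so the illustration is repeatable

# Three bootstrap samples, each the same size as the original sample.
samples = [bootstrap_sample(observations, rng) for _ in range(3)]
for i, s in enumerate(samples, start=1):
    print(f"bootstrap sample {i}: {s}")

# Every ordered resample that could occur: 4 choices on each of 4 draws.
all_resamples = list(itertools.product(observations, repeat=len(observations)))
print(len(all_resamples))  # 4 ** 4 = 256
```

Because each draw picks any observation with probability .25, every one of these ordered resamples has the same probability, (.25)^4.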

For this study, bootstrapping was not applied to the original sample; instead, three
samples (n = 50, n = 25, and n = 20) were randomly selected from the original sample
(N = 645) and bootstrapping was then applied to the reduced samples. The beauty of the
bootstrap method is that as many samples as the user wants can be generated. Efron and
Tibshirani (1993, p. 52) suggest around 200 bootstrap samples for estimating standard errors
and about 1000 bootstrap samples for empirical confidence intervals, but they do not recommend
a number of bootstrap samples for replicating factor structures in a population (since factor
analysis is not popular in the mathematical field). An arbitrary number between 200 and 1000,
namely 500, was chosen, and 500 bootstrap samples were generated from each of the reduced
samples. However, there is no reason to fix this number when replicating the present study, as
the number of bootstrap samples needed for factor structure studies depends solely on the
characteristics of the sample.
From each of the reduced samples (n = 50, n = 25, or n = 20), 500 bootstrap correlation
matrices were computed, and then a mean correlation matrix of the 500 matrices was estimated.
All these procedures were done with PRELIS 2 (Jöreskog & Sörbom, 1996). With the mean
correlation matrix, principal component analysis (PCA), multidimensional scaling (MDS), and
confirmatory factor analysis (CFA) were conducted. The results of PCA and MDS were
compared with the results from analyses of the original sample (N = 645). To inspect the
behavior of profile patterns, coordinates of PCA and MDS from bootstrap correlation matrices
were superimposed on those of the original sample. To quantify the correspondence between
the profiles estimated by bootstrapping and the major profiles, Pearson product-moment
correlation coefficients were used.
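
A minimal sketch of the mean bootstrap correlation matrix follows, assuming NumPy rather than PRELIS 2. The data are synthetic stand-ins (one common factor plus noise), and the names `mean_bootstrap_correlation`, `n_boot`, and `seed` are hypothetical; only the resample, correlate, and average logic mirrors the text.

```python
import numpy as np


def mean_bootstrap_correlation(data, n_boot=500, seed=0):
    """Average the correlation matrices of n_boot bootstrap samples.

    data is an (n_cases, n_variables) array. Each bootstrap sample draws
    n_cases rows from data with replacement.
    """
    rng = np.random.default_rng(seed)
    n_cases, n_vars = data.shape
    total = np.zeros((n_vars, n_vars))
    for _ in range(n_boot):
        rows = rng.integers(0, n_cases, size=n_cases)  # row indices, with replacement
        total += np.corrcoef(data[rows], rowvar=False)
    return total / n_boot


# Synthetic stand-in for a reduced sample: 50 cases on 6 subtests.
rng = np.random.default_rng(1)
general = rng.normal(size=(50, 1))
sample = 0.7 * general + rng.normal(size=(50, 6))

mean_r = mean_bootstrap_correlation(sample, n_boot=500)
print(np.round(mean_r, 2))
```

The resulting matrix can then be passed to PCA, MDS, or CFA in place of the single correlation matrix of the reduced sample.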

Method 

The data used in this study came from the Wechsler Preschool and Primary Scale of
Intelligence - Third Edition (WPPSI-III) standardization sample (Wechsler, 2002). The
subtests used in the study are Similarities (SI), Vocabulary (VC), Word Context (WC),
Block Design (BD), Matrix Reasoning (MR), and Picture Concepts (PCO). The Similarities
and Vocabulary subtests are traditionally considered components of Verbal IQ (VIQ), whereas
Word Context is a new VIQ subtest. The last three subtests, Block Design, Matrix Reasoning,
and Picture Concepts, are considered components of Performance IQ (PIQ). Block Design and
Matrix Reasoning are traditionally considered components of PIQ, whereas Picture Concepts
is a new PIQ subtest. Accordingly, a two-factor model is proposed: the VIQ vs. PIQ factors.

The data for this study were collected by The Psychological Corporation and are a subset of the
1534 girls and boys tested in the WPPSI-III standardization sample. The age range of the
standardization sample is from 2 to 7 years old, but the range for this study is 5 to 7 years
old. The gender composition is n = 316 (49%) girls and n = 329 (51%) boys.

For an exploratory principal component analysis (PCA), correlation matrices were used.
For Profile Analysis via Multidimensional Scaling (PAMS), a nonmetric MDS was used to
analyze these correlations. Since $\delta_{tt'} = 1 - r_{tt'}$ (see p. 105, Davison, 1993),
where t and t' refer to subtests, dissimilarity between two tests is inversely related to their
correlation. For the PAMS approach, correlations were converted into dissimilarities and the
dissimilarities were entered into the analysis. Table 1 shows the intercorrelations of the six
intelligence subtests and Table 2 shows the coordinates from PAMS and the 2nd PC; these
coordinates were estimated from the same correlation matrix of N = 645.
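
As a small worked illustration of the conversion, the Table 1 correlations can be turned into dissimilarities by taking one minus each correlation. The sketch assumes NumPy; the variable names are illustrative only.

```python
import numpy as np

subtests = ["SI", "VC", "WC", "BD", "MR", "PCO"]

# Intercorrelations from Table 1 (N = 645).
r = np.array([
    [1.00, 0.66, 0.60, 0.44, 0.42, 0.40],
    [0.66, 1.00, 0.63, 0.41, 0.41, 0.39],
    [0.60, 0.63, 1.00, 0.41, 0.41, 0.40],
    [0.44, 0.41, 0.41, 1.00, 0.49, 0.37],
    [0.42, 0.41, 0.41, 0.49, 1.00, 0.40],
    [0.40, 0.39, 0.40, 0.37, 0.40, 1.00],
])

# Dissimilarity is 1 - r, so highly correlated subtests end up close together.
dissimilarity = 1.0 - r

print(subtests)
print(np.round(dissimilarity, 2))
```

For instance, Similarities and Vocabulary (r = 0.66) are separated by 0.34, whereas Similarities and Picture Concepts (r = 0.40) are separated by 0.60.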



Insert Table 1 and Table 2 about here. 



The sample size of N = 645 was radically reduced to n = 50, n = 25, and n = 20 to
examine whether bootstrapping can help replicate the major profile patterns from the reduced 
samples. Each of the reduced samples for bootstrapping was randomly selected from the original 
sample (N = 645) and 500 bootstrap samples were generated from the sample. The sample 
designated for bootstrapping was considered to be a finite bootstrap population. 

Inter-subtest correlation matrices were computed from the five hundred bootstrap samples
and then a mean correlation matrix of the five hundred correlation matrices was estimated.
This mean correlation matrix was then used for the PCA, PAMS, and confirmatory factor
analyses. A confirmatory factor analysis, based on the same intercorrelations used in the
exploratory PCA and PAMS approach, was performed with LISREL 8 (Jöreskog & Sörbom,
1993) on the six subtests of the WPPSI-III. The hypothesized model based on the original
sample (N = 645) is presented in Figure 1, where circles represent latent variables and
rectangles represent observed variables (or subtests).



Insert Figure 1 about here. 



A two-factor model of “Intelligence” was hypothesized. The first factor is VIQ, which
corresponds to the negative side in the PAMS model, and the second factor is PIQ, which
corresponds to the positive side in the PAMS model. Considering the nature of intelligence,
the first factor is left free to correlate with the second factor.

Results 

Principal Component Analysis 

To examine the factor structure, principal component analysis (PCA) was conducted for all
samples with a two-component solution. The first principal component was ignored since it
represents a general ability or item difficulty factor and, as expected, had substantial positive
loadings on all subtests. Therefore, only the second component was examined. For the
bootstrapped samples, PCA was conducted using the bootstrap mean correlation matrix.
Table 3 summarizes the coordinate values of the second principal component estimated from
the non-bootstrapped samples and the bootstrapped samples.
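
To make the role of the two components concrete, the sketch below extracts them from the Table 1 correlation matrix with NumPy's symmetric eigendecomposition. It is an assumption-laden illustration: the study does not state which software or scaling produced the reported coordinates, so unit-length eigenvectors are used here, and the sign of each eigenvector (which is arbitrary) is fixed by hand.

```python
import numpy as np

subtests = ["SI", "VC", "WC", "BD", "MR", "PCO"]

# Intercorrelations from Table 1 (N = 645).
r = np.array([
    [1.00, 0.66, 0.60, 0.44, 0.42, 0.40],
    [0.66, 1.00, 0.63, 0.41, 0.41, 0.39],
    [0.60, 0.63, 1.00, 0.41, 0.41, 0.40],
    [0.44, 0.41, 0.41, 1.00, 0.49, 0.37],
    [0.42, 0.41, 0.41, 0.49, 1.00, 0.40],
    [0.40, 0.39, 0.40, 0.37, 0.40, 1.00],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so reorder to put the largest component first.
eigenvalues, eigenvectors = np.linalg.eigh(r)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

first_pc = eigenvectors[:, 0]
second_pc = eigenvectors[:, 1]

# Eigenvector signs are arbitrary; orient them for readability.
if first_pc.sum() < 0:
    first_pc = -first_pc
if second_pc[0] > 0:  # place Similarities on the negative side
    second_pc = -second_pc

for name, general, contrast in zip(subtests, first_pc, second_pc):
    print(f"{name:>3}  PC1 = {general:+.2f}  PC2 = {contrast:+.2f}")
```

With this orientation the first component weights every subtest positively, and the second component separates the three verbal subtests from the three performance subtests.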

Insert Table 3 about here.



PAMS 

To examine profile patterns, PAMS was conducted for all four samples used in the study.
Since a (1 + K)-factor solution in PCA, setting aside the general factor, corresponds to a
K-dimensional solution in MDS (see Davison, 1985 or Kim & Davison, 2001), a one-dimensional
solution was desired. PAMS was conducted using the same mean correlation matrix used in
PCA. Table 4 summarizes the coordinates of the dimension extracted from the non-bootstrapped
samples as well as the bootstrapped samples.



Insert Table 4 about here. 



The table includes the one-dimensional solution resulting from a nonmetric MDS
analysis of the correlations among the WPPSI-III intelligence subtests. The dimension had two
sides: negative and positive. The negative side, representing VIQ, consisted of Similarities,
Vocabulary, and Word Context. The positive side, representing PIQ, contained Block Design,
Matrix Reasoning, and Picture Concepts. The dimension that includes VIQ and PIQ was labeled
the General Ability dimension.

Confirmatory Factor Analysis

Maximum likelihood estimation was employed to estimate all models. The hypothesized
model (see Figure 1) using the original data (N = 645) was tested without inclusion of error
covariances among observed variables (or subtests) and was supported, $\chi^2$(8, N = 645) =
10.84, P-value = 0.21, RMSEA = 0.02, (AIC = 36.84 vs. Saturated AIC = 42.00), ECVI =
0.057, and GFI = 0.99. The model based on the sample data with n = 50, but using the bootstrap
mean correlation matrix, was tested and supported, $\chi^2$(8, n = 50) = 3.61, P-value
= 0.89, RMSEA = 0.00, (AIC = 29.61 vs. Saturated AIC = 42.00), ECVI = 0.69, and GFI = 0.98.
The model based on the sample data with n = 25, but using the bootstrap mean correlation
matrix, was tested but not supported, $\chi^2$(8, n = 25) = 16.34, P-value = 0.04,
RMSEA = 0.21, (AIC = 42.34 vs. Saturated AIC = 42.00), ECVI = 1.76, and GFI = 0.82. The
model based on the sample data with n = 20, but using the bootstrap mean correlation matrix,
was tested and supported, $\chi^2$(8, n = 20) = 8.57, P-value = 0.38, RMSEA = 0.06,
(AIC = 34.57 vs. Saturated AIC = 42.00), ECVI = 1.82, and GFI = 0.87. Figures 2, 3, and 4
represent the path diagrams of the confirmatory factor analyses based on the bootstrap mean
correlation matrices.
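
The reported AIC and RMSEA values can be cross-checked from the chi-square statistics. The sketch below assumes the commonly used definitions AIC = chi-square + 2q (q free parameters) and RMSEA = sqrt(max(chi-square - df, 0) / (df (N - 1))); the text does not state the formulas, and q = 13 is inferred here from df = 8 and the 6(6 + 1)/2 = 21 distinct elements of a six-variable matrix. The function names are hypothetical.

```python
import math

N_VARIABLES = 6
DF = 8

# Distinct variances and covariances among six observed variables.
n_moments = N_VARIABLES * (N_VARIABLES + 1) // 2  # 21
# Free parameters implied by df = n_moments - n_free (an inference, see above).
n_free = n_moments - DF  # 13


def aic(chi_square):
    """Model AIC under the assumed definition chi-square + 2 * free parameters."""
    return chi_square + 2 * n_free


def rmsea(chi_square, n):
    """RMSEA under the assumed definition, truncated at zero."""
    return math.sqrt(max(chi_square - DF, 0.0) / (DF * (n - 1)))


# Chi-square values and sample sizes as reported in the text.
models = {
    "N = 645": (10.84, 645),
    "Bn = 50": (3.61, 50),
    "Bn = 25": (16.34, 25),
    "Bn = 20": (8.57, 20),
}

print(f"saturated AIC = {2 * n_moments:.2f}")
for label, (chi_square, n) in models.items():
    print(f"{label}: AIC = {aic(chi_square):.2f}, RMSEA = {rmsea(chi_square, n):.2f}")
```

Under these assumptions the sketch reproduces the AIC and RMSEA values given above, which also shows why the n = 25 model fails: its chi-square exceeds the degrees of freedom by a wide margin relative to the small sample size.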



Insert Figures 2, 3, & 4 about here. 

Relationship Between Major Profile and Reduced Sample Profile

To inspect profile patterns, coordinates of PCA and MDS were plotted. Before 
bootstrapping was applied, the profiles estimated from reduced samples fell into different 
patterns from the major profile, but after bootstrapping, the profile patterns became similar to the 
major profile pattern. Moreover, to quantify correspondence between major profiles and 
estimated profiles from the samples, correlation coefficients were used. 

First, profile patterns of PCA coordinates were examined. The correlation between the
major profile and the profile of each non-bootstrapped sample (n = 50, n = 25, and n = 20) was
computed: Cor(N = 645, n = 50) = 0.72, Cor(N = 645, n = 25) = 0.10, and
Cor(N = 645, n = 20) = 0.75. None of the correlations were statistically significant and the mean
correlation was 0.52. Figure 5 represents PC profile patterns of non-bootstrapped samples.



Insert Figure 5 about here. 



The correlation between the major profile and the profile of each bootstrapped sample
(Bn = 50, Bn = 25, and Bn = 20) was examined: Cor(N = 645, Bn = 50) = 0.85*,
Cor(N = 645, Bn = 25) = 0.85*, and Cor(N = 645, Bn = 20) = 0.94**. All of the coefficients
were statistically significant at α = .05 and the mean correlation was 0.88. Figure 6 shows PC
profile patterns of bootstrapped samples.
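
The profile correlations can be approximated from the rounded coordinates in Table 3. The sketch below assumes NumPy, reads the garbled Word Context entry for Bn = 20 as -0.10, and assumes the usual t test for a Pearson correlation with 6 - 2 = 4 degrees of freedom (the text does not say which test was used). Because the table values are rounded, the results may differ slightly from the reported coefficients.

```python
import math

import numpy as np

# Second-principal-component coordinates from Table 3
# (order: SI, VC, WC, BD, MR, PCO).
major = np.array([-0.33, -0.41, -0.37, 0.47, 0.51, 0.32])     # N = 645
raw_n25 = np.array([-0.15, 0.08, -0.12, -0.65, 0.23, 0.69])   # n = 25, not bootstrapped
boot_n20 = np.array([-0.45, -0.60, -0.10, 0.44, 0.39, 0.29])  # Bn = 20, bootstrapped

# Two-tailed .05 critical value of t with 4 degrees of freedom.
T_CRITICAL = 2.776


def profile_correlation(a, b):
    """Pearson product-moment correlation between two coordinate profiles."""
    return float(np.corrcoef(a, b)[0, 1])


def t_statistic(r, n_points):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt((n_points - 2) / (1.0 - r ** 2))


for label, profile in [("n = 25", raw_n25), ("Bn = 20", boot_n20)]:
    r = profile_correlation(major, profile)
    t = t_statistic(r, len(major))
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, significant = {abs(t) > T_CRITICAL}")
```

With only six subtests per profile, a correlation must be large (above roughly .81) to reach significance at the .05 level, which is consistent with the pattern of significant and non-significant coefficients reported here.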

Insert Figure 6 about here.



Second, profile patterns of MDS coordinates were examined. The correlation between
the major profile and the profile of each non-bootstrapped sample (n = 50, n = 25, and n = 20)
was computed: Cor(N = 645, n = 50) = 0.17, Cor(N = 645, n = 25) = 0.02, and
Cor(N = 645, n = 20) = 0.05. None of the correlations were statistically significant and the mean
was 0.08. Figure 7 represents MDS profile patterns of non-bootstrapped samples.



Insert Figure 7 about here. 



The correlation between the major profile and the profile of each bootstrapped sample
(Bn = 50, Bn = 25, and Bn = 20) was examined: Cor(N = 645, Bn = 50) = 0.88*,
Cor(N = 645, Bn = 25) = 0.87*, and Cor(N = 645, Bn = 20) = 0.79 (ns). The first two
coefficients were statistically significant at α = .05, but the last one was not. The overall mean
was 0.85. Figure 8 represents MDS profile patterns of bootstrapped samples.



Insert Figure 8 about here. 

Discussion

Bootstrapping helped replicate the major profile pattern when the sample size was
drastically reduced from the original sample. When n = 50, the PC and PAMS coordinates of the
non-bootstrapped sample had the same direction as those of the bootstrapped sample. The
negative side included all VIQ subtests and the positive side contained all PIQ subtests, and the
direction was consistent with the direction of the original sample’s (N = 645) coordinates.
However, when n = 25 and n = 20, the directions of the PC and PAMS coordinates of the
non-bootstrapped samples were not consistent with the direction of the original sample
coordinates. On the other hand, the bootstrapped sample coordinates of the same size (n = 25 or
n = 20) were consistent with those of the original sample. Therefore, in this case, the estimated
factor structure of the bootstrapped sample can be informative, but the result from the
non-bootstrapped sample may be misleading.

To replicate the profile pattern (or factor structure) obtained from the bootstrapped
samples based on the bootstrap mean matrices, confirmatory factor analyses were conducted
and their model fits were examined. The results from the bootstrapped samples n = 50 and
n = 20 provided good model fits: P-value=0.89 and RMSEA=0.00 for n = 50 and P-value=0.38
and RMSEA=0.06 for n = 20. However, the results from the bootstrapped sample n = 25
(P-value=0.04 and RMSEA=0.21) were not as good as the results of the other bootstrapped
samples. Why?

There are a couple of possible explanations. First, the n = 25 sample drawn from the
original data (N = 645) may have been a bad sample that did not include characteristics of the
original factor structure. Second, 500 bootstrap samples may not be enough to replicate the
original factor structure. To examine the first presumption, descriptive statistics of the original
data and of the n = 25 sample were inspected. For N = 645, M(SI) = -0.02, M(VC) = -0.01,
M(WC) = 0.01, M(BD) = 0.02, M(MR) = 0.01, and M(PCO) = 0.00, with standard deviations of
1.00, 1.00, 0.99, 1.00, 1.01, and 1.00, respectively. For n = 25, M(SI) = 0.215, M(VC) = 0.142,
M(WC) = 0.20, M(BD) = 0.10, M(MR) = -0.04, and M(PCO) = -0.10, with standard deviations of
0.96, 1.20, 0.92, 1.02, 1.20, and 1.05, respectively. This examination shows that central
tendency and dispersion differ between the two samples (N = 645 and n = 25).

To examine the second presumption, the number of bootstrap samples was increased from
B=500 (the number used in the study) to B=1000, B=2000, B=3000, and B=5000, where “B”
represents the number of bootstrap samples, and bootstrap mean correlation matrices were
estimated from the increased bootstrap samples. Based on the mean correlation matrices,
confirmatory factor analyses were performed and their model fits were examined.

When the B=1000 mean correlation matrix was analyzed, the P-value for the chi-square was
0.00 and RMSEA=0.28. These results did not improve on those of the B=500 analysis
(P-value=0.04 and RMSEA=0.21). When the B=2000 mean correlation matrix was used,
P-value=0.15 and RMSEA=0.14 were obtained. These results showed improvement. When the
B=3000 and B=5000 mean correlation matrices were analyzed, P-value=0.07 and RMSEA=0.19
and P-value=0.09 and RMSEA=0.17 were obtained, respectively. These results were consistent
with those of the B=2000 analysis. Increasing the number of bootstrap samples did help
recover the original factor structure.

Considering the comparison of descriptive statistics (between N = 645 and n = 25) and the
effect of the increased number of bootstrap samples, it is interesting to notice that, regardless of
the difference in descriptive statistics between the two samples, increasing the number of
bootstrap samples resulted in improvement of the model fit. In other words, the bootstrapping
effect overpowered the distribution of the sample.

To reaffirm this proposition, all descriptive statistics of n = 50 and n = 20 were also
checked. Their means and standard deviations were not identical to those of N = 645. The
grand mean (6 subtests, N = 645) was 0.00 with a standard deviation of 1.00; for n = 50, the
grand mean was 0.07 with a standard deviation of 1.03; for n = 25, the grand mean was 0.09
with a standard deviation of 1.06; and for n = 20, the grand mean was 0.29 with a standard
deviation of 1.01. One might ask why B=500 worked well to replicate the original factor
structure for the sample sizes n = 50 and n = 20, but not for n = 25. Future research is
required to investigate this question.

Table 1

WPPSI-III Intelligence Subtests Intercorrelations: Ages From 5 To 7 Years Old (N = 645)

                     SI      VC      WC      BD      MR      PCO
Similarities        1.00    0.66    0.60    0.44    0.42    0.40
Vocabulary          0.66    1.00    0.63    0.41    0.41    0.39
Word Context        0.60    0.63    1.00    0.41    0.41    0.40
Block Design        0.44    0.41    0.41    1.00    0.49    0.37
Matrix Reasoning    0.42    0.41    0.41    0.49    1.00    0.40
Picture Concepts    0.40    0.39    0.40    0.37    0.40    1.00

Table 2

Coordinates from One-Dimensional Nonmetric Solution and from the Second Principal
Component based on Intercorrelations of Intelligence Subtests

Subtests            Dimension    2nd PC
Similarities          -1.00       -0.33
Vocabulary            -1.00       -0.41
Word Context          -1.00       -0.37
Block Design           1.01        0.47
Matrix Reasoning       1.01        0.51
Picture Concepts       0.99        0.32

Table 3

Coordinates of 2nd Principal Component including Bootstrapping Estimates

Subtests            N = 645   n = 50   Bn = 50   n = 25   Bn = 25   n = 20   Bn = 20
Similarities         -0.33    -0.33     -0.34    -0.15     -0.38    -0.44     -0.45
Vocabulary           -0.41    -0.29     -0.25     0.08     -0.65    -0.64     -0.60
Word Context         -0.37    -0.21     -0.40    -0.12     -0.05     0.21     -0.10
Block Design          0.47     0.09      0.12    -0.65      0.11     0.31      0.44
Matrix Reasoning      0.51     0.22      0.41     0.23      0.47     0.51      0.39
Picture Concepts      0.32     0.85      0.69     0.69      0.44     0.04      0.29

Table 4

Coordinates of MDS including Bootstrapping Estimates

Subtests            N = 645   n = 50   Bn = 50   n = 25   Bn = 25   n = 20   Bn = 20
Similarities         -1.00    -0.70     -1.03    -0.32     -0.42    -0.50     -1.15
Vocabulary           -1.00    -0.68     -0.64    -0.01     -1.85    -1.63     -1.22
Word Context         -1.00    -0.57     -0.97    -0.21     -0.33    -0.28     -0.02
Block Design          1.01    -0.11      0.23     1.65      0.75     1.31      1.72
Matrix Reasoning      1.01    -0.11      0.66     0.49      0.83     1.13      0.48
Picture Concepts      0.99     2.17      1.75     1.70      1.01    -0.02      0.18

Figure Captions

Figure 1. Path Diagram for The Model Using the Original Sample (N = 645) 
Figure 2. Path Diagram for The Model Using the Bootstrapped Sample (n = 50) 
Figure 3. Path Diagram for The Model Using the Bootstrapped Sample (n = 25) 
Figure 4. Path Diagram for The Model Using the Bootstrapped Sample (n = 20)
Figure 5. PC Profile Patterns of Non-Bootstrapped Samples 
Figure 6. PC Profile Patterns of Bootstrapped Samples 
Figure 7. MDS Profile Patterns of Non-Bootstrapped Samples 
Figure 8. MDS Profile Patterns of Bootstrapped Samples 